Part II - California Housing Price¶

by Timothy Adebisi¶

Investigation Overview¶

The overall goal of this presentation is to show how housing prices vary with location and proximity to the ocean. The main features of interest are housing_median_age (Years), median_income (tens of thousands of USD), median_house_value (USD) and ocean_proximity.

Dataset Overview¶

The data analyzed is the California housing price dataset, downloaded from Kaggle. The dataset contains 20,640 observations and 10 features, listed below:

  1. longitude
  2. latitude
  3. housing_median_age (Years)
  4. total_rooms
  5. total_bedrooms
  6. population
  7. households
  8. median_income (tens of thousands of USD)
  9. median_house_value (USD)
  10. ocean_proximity
In [ ]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import plotly.express as px

%matplotlib inline

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")
In [6]:
# load the dataset into a pandas dataframe and preview five random rows
df = pd.read_csv('housing.csv')
df.sample(5)
Out[6]:
       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
12032    -117.47     33.93                33.0        919.0           208.0       724.0       235.0         3.4028            110500.0           INLAND
10718    -117.83     33.65                 8.0       2149.0           426.0       950.0       399.0         4.1103            250400.0        <1H OCEAN
 2494    -120.19     36.60                25.0        875.0           214.0       931.0       214.0         1.5536             58300.0           INLAND
18368    -121.98     37.16                42.0       2533.0           433.0       957.0       398.0         5.3468            279900.0        <1H OCEAN
 9973    -122.40     38.53                24.0       1741.0           289.0       564.0       231.0         3.6118            248400.0           INLAND
In [7]:
# Drop rows that contain null values (the integer cast below fails on NaN)
df.dropna(axis=0, inplace=True)

# Change the datatype of the count-like features from `float` to `int`
obs = ['housing_median_age', 'total_rooms', 'total_bedrooms', 'population', 'households']

for v in obs:
    df[v] = df[v].astype('int')
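
The cleaning step above drops rows first because casting a float column that still contains NaN to `int` raises an error. The sketch below wraps the same two steps in functions so that the number of nulls per column can be inspected before anything is dropped. The names `missing_summary`, `clean_housing` and `INT_COLUMNS` are hypothetical helpers, not part of the original notebook, and the demo frame reuses three rows from the sample output with one `total_bedrooms` value blanked out purely for illustration.

```python
import numpy as np
import pandas as pd

# Count-like columns that the notebook casts from float to int
INT_COLUMNS = ["housing_median_age", "total_rooms", "total_bedrooms",
               "population", "households"]


def missing_summary(frame):
    """Count nulls per column, keeping only columns with at least one null."""
    counts = frame.isna().sum()
    return counts[counts > 0]


def clean_housing(frame, int_columns=INT_COLUMNS):
    """Return a cleaned copy: null rows dropped, count-like columns cast to int."""
    cleaned = frame.dropna(axis=0).copy()
    cleaned[int_columns] = cleaned[int_columns].astype("int64")
    return cleaned


# Rows 12032, 10718 and 2494 from the sample output; the NaN in the third
# row is an assumption added for illustration only.
demo = pd.DataFrame({
    "housing_median_age": [33.0, 8.0, 25.0],
    "total_rooms": [919.0, 2149.0, 875.0],
    "total_bedrooms": [208.0, 426.0, np.nan],
    "population": [724.0, 950.0, 931.0],
    "households": [235.0, 399.0, 214.0],
})

print(missing_summary(demo))        # which columns have nulls, and how many
print(clean_housing(demo).dtypes)   # dtypes after dropping and casting
```

On the full dataset, `missing_summary(df)` would be called right after `pd.read_csv`, before the `dropna` step. Returning a copy instead of using `inplace=True` keeps the raw frame available for comparison.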

Income and House Value¶

There is a positive correlation between median household income and median house value, as shown in the plot below:

In [9]:
# Scatter plot of house value and income
sb.scatterplot(data=df, x='median_income', y='median_house_value')
plt.xlabel('Income [Tens of Thousands USD]')
plt.ylabel('House Value [USD]')
plt.title('Income vs House Value');
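
The positive relationship seen in the scatter plot can also be summarized with a single number. The following is a minimal sketch that computes the Pearson correlation coefficient with pandas. The function name `income_value_correlation` is a hypothetical helper, and the demo uses only the five rows from the sample output, which are far too few to be representative of the full dataset.

```python
import pandas as pd


def income_value_correlation(frame):
    """Pearson correlation between median income and median house value."""
    return frame["median_income"].corr(frame["median_house_value"])


# The five rows shown in the sample output earlier in the notebook
sample = pd.DataFrame({
    "median_income": [3.4028, 4.1103, 1.5536, 5.3468, 3.6118],
    "median_house_value": [110500.0, 250400.0, 58300.0, 279900.0, 248400.0],
})

r = income_value_correlation(sample)
print(f"Pearson r on the five sampled rows: {r:.2f}")
```

Calling `income_value_correlation(df)` on the cleaned dataframe would give the coefficient for the full dataset.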

Housing Age and House Value¶

The scatter plot below shows no clear relationship between housing median age and median house value.

In [10]:
# Scatter plot of house age and house value
sb.scatterplot(data=df, x='housing_median_age', y='median_house_value')
plt.xlabel('Housing Age [Years]')
plt.ylabel('House Value [USD]')
plt.title('Housing Age vs House Value');
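
One way to check the absence of a trend beyond eyeballing the scatter plot is to group districts into age bands and compare the median house value per band. The sketch below does this with `pd.cut` and `groupby`. The function name `median_value_by_age_band` and the band edges are assumptions chosen for illustration, and the demo again uses only the five rows from the sample output.

```python
import pandas as pd


def median_value_by_age_band(frame, edges=(0, 10, 20, 30, 40, 60)):
    """Median house value within each housing-age band (edges are illustrative)."""
    bands = pd.cut(frame["housing_median_age"], bins=list(edges))
    # observed=True leaves out age bands that contain no districts
    return frame.groupby(bands, observed=True)["median_house_value"].median()


# The five rows shown in the sample output earlier in the notebook
sample = pd.DataFrame({
    "housing_median_age": [33.0, 8.0, 25.0, 42.0, 24.0],
    "median_house_value": [110500.0, 250400.0, 58300.0, 279900.0, 248400.0],
})

by_band = median_value_by_age_band(sample)
print(by_band)
```

If the medians from `median_value_by_age_band(df)` stay roughly flat across bands on the full dataset, that would support the reading of the scatter plot.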

Housing Location and House Value¶

Location has an impact on house value: the closer a house is to the water, the higher its value tends to be.

In [11]:
# Plot each district on a map, colored by median house value
fig = px.scatter_mapbox(df,
                        lat='latitude',
                        lon='longitude',
                        center={'lat':37.09, 'lon':-121},
                        height=600,
                        width=600,
                        color='median_house_value',
                        hover_data=['ocean_proximity'])
fig.update_layout(mapbox_style='open-street-map', title='Housing Price and Location')
fig.show()
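
The map shows the pattern visually. A numeric companion is to compare the median house value across the `ocean_proximity` categories. The sketch below is one possible way to do that. The function name `median_value_by_proximity` is a hypothetical helper, and the demo uses only the five rows from the sample output, which cover just two of the categories.

```python
import pandas as pd


def median_value_by_proximity(frame):
    """Median house value per ocean_proximity category, highest first."""
    return (frame.groupby("ocean_proximity")["median_house_value"]
                 .median()
                 .sort_values(ascending=False))


# The five rows shown in the sample output earlier in the notebook
sample = pd.DataFrame({
    "ocean_proximity": ["INLAND", "<1H OCEAN", "INLAND", "<1H OCEAN", "INLAND"],
    "median_house_value": [110500.0, 250400.0, 58300.0, 279900.0, 248400.0],
})

by_proximity = median_value_by_proximity(sample)
print(by_proximity)
```

Running `median_value_by_proximity(df)` on the cleaned dataframe would rank all categories present in the full dataset.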

Generate Slideshow: Once you're ready to generate your slideshow, use the jupyter nbconvert command to generate the HTML slideshow. From the terminal or command line, run the following command.

In [ ]:
!jupyter nbconvert Part_II_slide_deck_template.ipynb --to slides --post serve --no-input --no-prompt

This should open a tab in your web browser where you can scroll through the presentation. Sub-slides can be accessed by pressing 'down' while viewing their parent slide. Once you are done, you can stop the kernel.